Skip to content

SVM optimization problem

minw,b12||w||2subject to yi(wTXi+b)1,i=1,,myi(wTxi+b)1,i=1,2,,m
  • We can write the constraints as

    gi(w)=1yi(wTxi+b)0
  • When we construct the Lagrangian for our optimization problem, we have:

    L(w,b,α)=12w2+i=1mαi[1yi(wTxi+b)]
  • Let’s find the dual form of the problem.

    • First minimize L(w,b,α) with respect to w and b (for fixed α), to get θD(α).
  • We’ll do this by setting the derivatives of L with respect to w and b to zero:

    wL(w,b,α)=wi=1mαiyixi=0bL(w,b,α)=i=1mαiyi=0
  • We have: $$w = \sum_{i=1}^m \alpha_i y_i x_i$$ and $$\sum_{i=1}^m \alpha_i y_i = 0$$. Plugging back into the Lagrangian equation:

    L(w,b,α)=12w2+i=1mαi[1yi(wTxi+b)]=i=1mαi12i=1mj=1mαiαjyiyjxiTxjbi=1mαiyi=i=1mαi12i=1mj=1mαiαjyiyjxiTxj

Hard SVM

Hyperplane: H={w|wTx+b=0}Constraint: yi(wTxi+b)1 iGoal: min12||w||2 s.t. yi(wtxi+b)1Lagrangian:L(w,b,α)=12||w||2iαi(yi(wTxi+b)1),αi0Partial derivative: Lw=wiαiyixi=0 Lb=iαiyi=0Solution: ||w||2=(iαiyixi)T(iαiyixi)=ijαiαjyiyjxiTxjLagrangian becomes: L=iαi12ijαiαjyiyjxiTxjs.t. iαiyi=0 and αi0iWeight vector: w=iαiyixiBias: b=yiiαiyixiTxj

Soft SVM

Hyperplane: H={w|wTx+b=0}Constraint: yi(wTxi+b)1ξi,ξi0,iGoal: min12||w||2+Ci=1nξi,s.t.yi(wTxi+b)1ξi,ξi0
Lagrangian: L(w,b,α,ξ)=12||w||2+Ci=1nξii=1nαi(yi(wTxi+b)1+ξi)i=1nμiξi,αi,μi0Partial Derivative: Lw=wi=1nαiyixi=0,Lb=i=1nαiyi=0,Lξi=Cαiμi=0Solution: ||w||2=i=1nj=1nαiαjyiyjxiTxjDual Problem: L=maxαi=1nαi12i=1nj=1nαiαjyiyjxiTxjs.t. i=1nαiyi=0,0αiC

Weight vector: w=i=1nαiyixiBias: b=yki=1nαiyixiTxkfor any 0<αk<C

The reason that ξ disappears: The slack variables ξi disappear in the dual problem because they are implicitly handled through the Lagrange multipliers αi. By taking the derivative of the Lagrangian with respect to ξi, we obtain:Lξi=Cαiμi=0 This relationship ensures that αi is bounded by 0αiC. Consequently, the slack variables αi do not explicitly appear in the dual formulation. Instead, the dual problem balances maximizing the margin and allowing for misclassification through the constraint on αi.

Kernel SVM

Hyperplane: H={w|wTϕ(x)+b=0}Constraint: yi(wTϕ(xi)+b)1ξi,ξi0,iGoal: min12||w||2+Ci=1nξi,s.t.yi(wTϕ(xi)+b)1ξiLagrangian (Dual): L(α)=i=1nαi12i=1nj=1nαiαjyiyjK(xi,xj)s.t. i=1nαiyi=0,0αiC,iWeight vector: w=i=1nαiyiϕ(xi)Decision Function: f(x)=sign(i=1nαiyiK(xi,x)+b)Bias: b=yki=1nαiyiK(xi,xk)sup vec 0<αk<CKernel Functions:
Linear: K(xi,xj)=xiTxj
Polynomial: K(xi,xj)=(xiTxj+c)d
Gaussian (RBF): K(xi,xj)=exp(||xixj||22σ2)
Sigmoid: K(xi,xj)=tanh(κxiTxj+c)